v1.0 · MVP · Solo Founder · 6 Weeks

Technical Architecture

Production-ready SaaS stack for ModelPilot AI — a multi-tenant no-code AI chatbot platform. Optimized for solo founder velocity with clean abstractions that scale.

Tech Stack

⚡

Next.js 14

Frontend · App Router · TypeScript

🐍

FastAPI

Backend · Python · Async

🗄️

Supabase

PostgreSQL · Auth · Storage · Realtime

🔮

Qdrant

Vector DB · RAG · Semantic Search

🔀

LiteLLM

AI Gateway · Multi-provider routing

🔴

Redis + Celery

Task queue · Document processing

🚂

Railway

Backend hosting · Auto-deploy

▲

Vercel

Frontend hosting · Edge network

What NOT to Build in MVP

❌ Skip in MVP

Custom model fine-tuning

Mobile app (iOS/Android)

Real-time collaboration

Advanced analytics dashboards

Zapier / Slack integrations

Custom LLM hosting

White-label platform

✓ Ship in MVP

Multi-provider AI routing

PDF/URL knowledge ingestion

Embeddable chat widget

Multi-tenant auth

Usage + cost tracking

Basic conversation logs

Stripe billing

6-Week Sprint Plan

Sequenced for a solo founder. Each week ships something live. Ship early, iterate fast.

Week 1 · Foundation

Project scaffolding + Auth + Multi-tenancy

Init Next.js 14 (App Router) + FastAPI monorepo

Supabase project: Auth (email + Google OAuth), RLS policies

PostgreSQL schema: workspaces, users, workspace_members

JWT middleware on FastAPI, tenant context injection

Deploy skeleton to Vercel + Railway

Basic dashboard shell UI (sidebar, routing)

Week 2 · Core AI

LiteLLM gateway + Chat endpoint + Chatbot builder

LiteLLM proxy setup with OpenAI, Anthropic, Gemini

POST /chat endpoint with SSE streaming

Chatbot CRUD API + DB tables

AI provider key management (encrypted storage)

Chatbot builder UI (name, model, system prompt)

Basic conversation logging to PostgreSQL

Week 3 · Knowledge

RAG pipeline + Document ingestion + Qdrant

Qdrant Cloud setup, collection per workspace

Celery + Redis for async document processing

PDF/DOCX text extraction (pdfplumber, python-docx)

URL scraping (BeautifulSoup + trafilatura)

Chunking + embedding pipeline (text-embedding-3-small)

RAG retrieval injected into chat context

Knowledge base UI (upload, status, list)

Week 4 · Widget

Embeddable chat widget + Public chat API

Public /widget/:botId endpoint (no auth, CORS open)

Vanilla JS widget (zero dependencies, <8KB gzipped)

CDN hosting (Cloudflare R2 or Supabase Storage)

Widget customization: colors, greeting, position

Embed code generator UI

Rate limiting per botId (Redis sliding window)
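
The sliding-window limiter in the last item can be sketched in plain Python. This is an in-memory stand-in, not the Redis implementation: the class name and the injectable `now` argument are assumptions made for illustration, and the comments note the sorted-set commands a Redis version would typically use.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """In-memory stand-in for a Redis sorted-set sliding window (hypothetical helper)."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key (e.g. botId) -> request timestamps

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that slid out of the window
        # (ZREMRANGEBYSCORE in a Redis version).
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)  # ZADD in a Redis version
        return True


limiter = SlidingWindowLimiter(limit=2, window_seconds=60)
print(limiter.allow("bot-1", now=0.0))   # first request in the window
print(limiter.allow("bot-1", now=1.0))   # second request, still allowed
print(limiter.allow("bot-1", now=2.0))   # third request inside 60s is rejected
```

Unlike a fixed window, the oldest hit expires exactly `window_seconds` after it was made, so a burst at a window boundary cannot double the effective limit.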

Week 5 · Billing

Stripe billing + Usage tracking + Limits

Stripe Checkout + webhook handler (subscription CRUD)

Usage metering: messages, tokens, cost per workspace

Plan enforcement middleware (message caps, bot limits)

Pricing page + upgrade flow UI

Billing portal redirect (Stripe Customer Portal)

Email alerts: usage at 80% / 100% cap
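
A minimal sketch of the cap check and the 80% / 100% alerts listed above. The function names are hypothetical, and it assumes an unlimited plan is stored as a NULL (None) quota, which the schema does not state explicitly. Integer arithmetic avoids float rounding at the exact threshold.

```python
from typing import Optional

ALERT_PERCENTS = (100, 80)  # checked from highest to lowest


def is_over_quota(used: int, quota: Optional[int]) -> bool:
    """True once the workspace has consumed its monthly message quota."""
    return quota is not None and used >= quota


def crossed_alert(before: int, after: int, quota: Optional[int]) -> Optional[int]:
    """Return the alert percentage crossed when usage moves from before to after.

    Comparing before and after means each alert fires once, on the
    message that crosses the threshold, instead of on every later message.
    """
    if quota is None or quota <= 0:
        return None
    for pct in ALERT_PERCENTS:
        mark = quota * pct  # compare used * 100 against quota * pct
        if before * 100 < mark <= after * 100:
            return pct
    return None


print(crossed_alert(3999, 4000, 5000))  # crosses the 80% mark
print(crossed_alert(4000, 4001, 5000))  # already past 80%, no new alert
print(is_over_quota(5000, 5000))        # cap reached
```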

Week 6 · Launch

Polish + Onboarding + Analytics + Launch

Onboarding flow (workspace → provider → bot → widget)

Dashboard analytics (charts, cost, conversation logs)

Conversation log viewer with transcript export

Team invites (email-based, role assignment)

Error handling, empty states, loading states everywhere

PostHog analytics + Sentry error tracking

Product Hunt / launch prep, pricing live

Database Schema

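PostgreSQL via Supabase. Multi-tenancy relies on a workspace_id column on every tenant-scoped table plus Row Level Security (RLS); messages carry no workspace_id of their own and inherit their tenant through conversations. All timestamps are stored in UTC.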

Core Tables

workspaces: top-level tenant

Column · Type · Notes
id · uuid · Primary key (PK), default gen_random_uuid()
name · text · Workspace display name
slug · text · URL-safe identifier, unique
plan · text · 'starter' | 'pro' | 'enterprise'
stripe_customer_id · text · Stripe customer reference
message_quota · integer · Monthly message limit (5000/50000/∞)
created_at · timestamptz · default now()

workspace_members: User ↔ Workspace join

Column · Type · Notes
user_id · uuid · FK → auth.users.id
workspace_id · uuid · FK → workspaces.id
role · text · 'admin' | 'editor' | 'viewer'
invited_by · uuid · FK → auth.users.id (nullable)
joined_at · timestamptz · null = pending invite

chatbots: bot configuration

Column · Type · Notes
id · uuid · Primary key (PK)
workspace_id · uuid · FK → workspaces.id (RLS partition)
name · text · Display name, e.g. "Support Bot"
system_prompt · text · LLM system instruction
model · text · 'gpt-4o' | 'claude-3-5-sonnet' | ...
provider_id · uuid · FK → ai_providers.id
temperature · float4 · default 0.7
max_tokens · integer · default 1024
widget_config · jsonb · {color, greeting, position, avatar}
status · text · 'draft' | 'live' | 'paused'
created_at · timestamptz · default now()

ai_providers: encrypted API keys per workspace

Column · Type · Notes
id · uuid · PK
workspace_id · uuid · FK → workspaces.id
provider · text · 'openai' | 'anthropic' | 'google' | 'groq'
api_key_enc · text · AES-256 encrypted, never returned to client
monthly_budget · numeric · USD cap, null = unlimited
rate_limit_rpm · integer · Requests per minute cap
is_active · boolean · default true
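
The api_key_enc column implies an encrypt/decrypt helper like the one services/encryption.py is meant to hold. The sketch below is one way to do it, assuming AES-256-GCM from the third-party `cryptography` package. The function signatures, the nonce-prefixed base64 layout, and binding the ciphertext to the workspace ID are all assumptions for illustration, not the project's actual format.

```python
import base64
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_BYTES = 12  # 96-bit nonce, the size recommended for GCM


def encrypt_key(plaintext: str, key: bytes, workspace_id: str) -> str:
    """Encrypt a provider API key; returns base64(nonce + ciphertext + tag)."""
    if len(key) != 32:
        raise ValueError("AES-256 needs a 32-byte key")
    nonce = os.urandom(NONCE_BYTES)  # must be unique per encryption
    # The workspace ID is authenticated but not encrypted, so a ciphertext
    # copied into another workspace's row fails to decrypt.
    ct = AESGCM(key).encrypt(nonce, plaintext.encode(), workspace_id.encode())
    return base64.b64encode(nonce + ct).decode()


def decrypt_key(token: str, key: bytes, workspace_id: str) -> str:
    raw = base64.b64decode(token)
    nonce, ct = raw[:NONCE_BYTES], raw[NONCE_BYTES:]
    return AESGCM(key).decrypt(nonce, ct, workspace_id.encode()).decode()


master_key = os.urandom(32)  # in the app this would come from ENCRYPTION_KEY
stored = encrypt_key("sk-example", master_key, "ws-1")
print(decrypt_key(stored, master_key, "ws-1"))
```

GCM is authenticated, so a tampered or misplaced ciphertext raises an exception instead of returning garbage.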

knowledge_documents: indexed source files

Column · Type · Notes
id · uuid · PK
workspace_id · uuid · FK → workspaces.id
chatbot_id · uuid · FK → chatbots.id (nullable = global)
source_type · text · 'pdf' | 'url' | 'docx' | 'faq' | 'txt'
source_url · text · Storage URL or scraped URL
filename · text · Original filename
chunk_count · integer · Vectors stored in Qdrant
status · text · 'pending' | 'processing' | 'indexed' | 'error'
error_msg · text · nullable, set on error
created_at · timestamptz

conversations: chat session headers

Column · Type · Notes
id · uuid · PK
chatbot_id · uuid · FK → chatbots.id
workspace_id · uuid · FK → workspaces.id (denormalized for RLS)
session_id · text · Client-generated UUID (widget visitor)
user_identifier · text · Email or anonymous ID from widget
model_used · text · e.g. 'gpt-4o'
total_tokens · integer · Accumulated across all messages
total_cost_usd · numeric · Running cost in USD
status · text · 'active' | 'resolved' | 'handoff'
started_at · timestamptz
ended_at · timestamptz · nullable

messages: individual chat turns

Column · Type · Notes
id · uuid · PK
conversation_id · uuid · FK → conversations.id
role · text · 'user' | 'assistant' | 'system'
content · text · Message text
tokens · integer · Token count for this message
sources · jsonb · RAG chunks used [{doc_id, score, excerpt}]
latency_ms · integer · Time to first token (ms)
created_at · timestamptz

usage_events: append-only metering log

Column · Type · Notes
id · bigserial · PK
workspace_id · uuid · FK
event_type · text · 'chat_message' | 'doc_indexed' | 'widget_load'
tokens_used · integer · nullable
cost_usd · numeric(10,6) · nullable
provider · text · nullable
model · text · nullable
created_at · timestamptz · default now()

Folder Structure

Monorepo with apps/web (Next.js) and apps/api (FastAPI). Shared types via packages/types.

bash

# Root monorepo
modelpilot/
├── apps/
│   ├── web/                        # Next.js 14 frontend
│   │   ├── app/
│   │   │   ├── (auth)/             # Login, signup, onboarding
│   │   │   ├── (dashboard)/        # Authenticated app shell
│   │   │   │   ├── layout.tsx      # Sidebar + topbar wrapper
│   │   │   │   ├── page.tsx        # Dashboard
│   │   │   │   ├── chatbots/
│   │   │   │   ├── knowledge/
│   │   │   │   ├── providers/
│   │   │   │   ├── logs/
│   │   │   │   ├── widget/
│   │   │   │   ├── team/
│   │   │   │   └── pricing/
│   │   │   └── api/                # Next.js API routes (thin proxies)
│   │   ├── components/
│   │   │   ├── ui/                 # Button, Input, Badge, Modal...
│   │   │   ├── chatbot/            # BotCard, BotEditor, ChatPreview
│   │   │   ├── knowledge/          # UploadZone, DocTable, FAQEditor
│   │   │   └── widget/             # WidgetPreview, EmbedCode
│   │   ├── lib/
│   │   │   ├── api.ts              # Typed fetch wrapper
│   │   │   ├── supabase.ts         # Supabase client
│   │   │   └── hooks/              # useWorkspace, useChatbots, ...
│   │   └── middleware.ts           # Auth guard + workspace redirect
│   │
│   └── api/                        # FastAPI backend
│       ├── main.py                 # App entry, middleware, routers
│       ├── routers/
│       │   ├── chat.py             # POST /chat (SSE stream)
│       │   ├── chatbots.py         # CRUD /chatbots
│       │   ├── knowledge.py        # Upload, list, delete
│       │   ├── providers.py        # API key management
│       │   ├── widget.py           # Public widget endpoint
│       │   ├── analytics.py        # Usage, cost, logs
│       │   └── billing.py          # Stripe webhooks
│       ├── services/
│       │   ├── llm.py              # LiteLLM wrapper
│       │   ├── rag.py              # Qdrant search + context build
│       │   ├── ingestion.py        # Chunking + embedding pipeline
│       │   ├── billing.py          # Stripe + usage metering
│       │   └── encryption.py       # AES-256 for API keys
│       ├── workers/
│       │   └── tasks.py            # Celery tasks (doc processing)
│       ├── models/
│       │   └── schemas.py          # Pydantic request/response models
│       ├── middleware/
│       │   ├── auth.py             # JWT verify + tenant inject
│       │   └── rate_limit.py       # Redis sliding window
│       └── db/
│           ├── client.py           # Supabase + asyncpg connection
│           └── queries.py          # Raw SQL helpers
│
├── packages/
│   └── types/                      # Shared TS types
│
├── widget/                         # Embeddable widget (vanilla JS)
│   ├── src/widget.ts
│   ├── dist/widget.js              # Built, hosted on CDN
│   └── rollup.config.js
│
├── docker-compose.yml              # Redis + Qdrant local dev
└── .env.example

API Routes

REST API on FastAPI. Base URL: https://api.modelpilot.ai/v1. All routes require Authorization: Bearer <jwt> except widget endpoints.

Chatbots

GET /chatbots · List all chatbots for workspace
POST /chatbots · Create a new chatbot
GET /chatbots/:id · Get chatbot by ID
PUT /chatbots/:id · Update chatbot config
DELETE /chatbots/:id · Delete chatbot + all data

Chat

POST /chat · Send message, returns SSE stream (authenticated)
POST /widget/:botId/chat · Public widget chat, no auth, rate limited by IP
GET /widget/:botId/config · Public widget config (colors, greeting, etc.)

Knowledge

GET /knowledge · List documents (filter by chatbot)
POST /knowledge/upload · Upload file (multipart), enqueues processing
POST /knowledge/scrape · Scrape URL, enqueues processing
DELETE /knowledge/:id · Delete doc + Qdrant vectors
GET /knowledge/:id/status · Poll processing status

Analytics

GET /analytics/overview · KPIs: messages, cost, users (range param)
GET /analytics/usage · Time series: tokens + cost per day
GET /conversations · List conversations (filter, paginate)
GET /conversations/:id/messages · Full message transcript

Chat Endpoint

SSE streaming endpoint. RAG context is injected into the system prompt before the LLM call. Token usage is read from the stream's final usage chunk and written to the DB once the stream completes.

FastAPI — apps/api/routers/chat.py

python

from fastapi import APIRouter, Depends, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from typing import AsyncIterator
import json, time

from ..middleware.auth import get_current_workspace
from ..services.llm import stream_chat
from ..services.rag import retrieve_context
from ..db.client import db

router = APIRouter(prefix="/chat", tags=["chat"])

class ChatRequest(BaseModel):
    chatbot_id: str
    session_id: str
    message: str
    history: list[dict] = []   # [{role, content}, ...]

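@router.post("")
async def chat(
    req: ChatRequest,
    workspace = Depends(get_current_workspace)
):
    # WorkspaceCtx exposes the tenant as workspace_id (see middleware/auth.py)
    workspace_id = workspace.workspace_id

    # 1. Load chatbot config, scoped to the caller's workspace
    bot = await db.fetchrow(
        "SELECT * FROM chatbots WHERE id=$1 AND workspace_id=$2",
        req.chatbot_id, workspace_id
    )
    if not bot:
        raise HTTPException(404, "Chatbot not found")

    # 2. Find or create the conversation for this session, then log the user turn.
    #    Assumes db exposes asyncpg's fetchval and that id columns have defaults.
    conversation_id = await db.fetchval(
        "SELECT id FROM conversations WHERE session_id=$1 AND chatbot_id=$2 LIMIT 1",
        req.session_id, req.chatbot_id
    )
    if conversation_id is None:
        conversation_id = await db.fetchval("""
            INSERT INTO conversations
                (chatbot_id, workspace_id, session_id, model_used, status, started_at)
            VALUES ($1, $2, $3, $4, 'active', now())
            RETURNING id
        """, req.chatbot_id, workspace_id, req.session_id, bot["model"])
    await db.execute(
        "INSERT INTO messages (conversation_id, role, content) VALUES ($1, 'user', $2)",
        conversation_id, req.message
    )

    # 3. RAG: retrieve relevant context
    context_chunks = await retrieve_context(
        query=req.message,
        chatbot_id=req.chatbot_id,
        workspace_id=workspace_id,
        top_k=5
    )

    # 4. Build messages array
    system = bot["system_prompt"]
    if context_chunks:
        ctx_text = "\n\n".join(c["text"] for c in context_chunks)
        system += f"\n\n--- KNOWLEDGE BASE ---\n{ctx_text}"

    messages = [
        {"role": "system", "content": system},
        *req.history,
        {"role": "user", "content": req.message}
    ]

    # 5. Stream response
    started = time.monotonic()

    async def event_stream() -> AsyncIterator[str]:
        full_text = ""
        total_tokens = 0
        cost_usd = 0.0
        first_token_ms = None  # matches messages.latency_ms (time to first token)

        async for chunk in stream_chat(
            model=bot["model"],
            messages=messages,
            temperature=bot["temperature"],
            max_tokens=bot["max_tokens"],
            workspace_id=workspace_id
        ):
            if chunk.type == "text":
                if first_token_ms is None:
                    first_token_ms = int((time.monotonic() - started) * 1000)
                full_text += chunk.text
                yield f"data: {json.dumps({'text': chunk.text})}\n\n"
            elif chunk.type == "usage":
                total_tokens = chunk.total_tokens
                cost_usd = chunk.cost_usd

        # 6. Persist after the last text chunk. These writes finish before the
        #    final "done" event is sent, so the client sees all text first.
        await db.execute("""
            INSERT INTO messages (conversation_id, role, content, tokens, sources, latency_ms)
            VALUES ($1, 'assistant', $2, $3, $4, $5)
        """, conversation_id, full_text, total_tokens,
            json.dumps(context_chunks), first_token_ms)

        await db.execute("""
            INSERT INTO usage_events (workspace_id, event_type, tokens_used, cost_usd, model)
            VALUES ($1, 'chat_message', $2, $3, $4)
        """, workspace_id, total_tokens, cost_usd, bot["model"])

        yield f"data: {json.dumps({'done': True, 'tokens': total_tokens, 'cost': cost_usd})}\n\n"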

    return StreamingResponse(
        event_stream(),
        media_type="text/event-stream",
        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"}
    )

LiteLLM Service — services/llm.py

python

import litellm
from .encryption import decrypt_key
from ..db.client import db

# Cost per 1M tokens (input, output)
MODEL_COSTS = {
    "gpt-4o":            (5.00, 15.00),
    "gpt-4o-mini":       (0.15,  0.60),
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-haiku":    (0.25,  1.25),
    "gemini-1.5-pro":    (3.50, 10.50),
    "gemini-flash":      (0.075, 0.30),
}

async def stream_chat(model, messages, temperature, max_tokens, workspace_id):
    # Map the model-name prefix to the provider column:
    # 'gpt' → 'openai', 'claude' → 'anthropic', 'gemini' → 'google'
    prefix = model.split("-")[0]
    provider_map = {"gpt": "openai", "claude": "anthropic", "gemini": "google"}
    provider_name = provider_map.get(prefix, prefix)

    # Fetch and decrypt the API key for this workspace + provider
    row = await db.fetchrow(
        "SELECT api_key_enc FROM ai_providers "
        "WHERE workspace_id=$1 AND provider=$2 AND is_active",
        workspace_id, provider_name
    )
    if row is None:
        raise ValueError(f"No active {provider_name} API key configured for this workspace")
    api_key = decrypt_key(row["api_key_enc"])

    response = await litellm.acompletion(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
        api_key=api_key,
        stream=True,
        stream_options={"include_usage": True},  # request a final usage chunk
    )

    total_in = total_out = 0
    async for chunk in response:
        # The usage-only chunk can arrive with an empty choices list
        if chunk.choices:
            delta = chunk.choices[0].delta
            if delta.content:
                yield type("C", (), {"type": "text", "text": delta.content})()
        usage = getattr(chunk, "usage", None)
        if usage:
            total_in = usage.prompt_tokens
            total_out = usage.completion_tokens

    # Unknown models fall back to the most expensive rate so cost is never under-reported
    ci, co = MODEL_COSTS.get(model, (5, 15))
    cost = (total_in * ci + total_out * co) / 1_000_000
    yield type("U", (), {"type": "usage", "total_tokens": total_in + total_out, "cost_usd": cost})()
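
As a worked example of the cost formula at the end of stream_chat, the sketch below recomputes it with Decimal so the result fits the numeric(10,6) column without float drift. The function name and the two-model price table are illustrative only; the prices are copied from MODEL_COSTS above and may be out of date.

```python
from decimal import Decimal

# USD per 1M tokens (input, output), copied from MODEL_COSTS above
PRICES = {
    "gpt-4o-mini": (Decimal("0.15"), Decimal("0.60")),
    "claude-3-haiku": (Decimal("0.25"), Decimal("1.25")),
}
FALLBACK = (Decimal("5"), Decimal("15"))  # same default as stream_chat


def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """cost = (input_tokens * input_price + output_tokens * output_price) / 1M"""
    price_in, price_out = PRICES.get(model, FALLBACK)
    cost = (prompt_tokens * price_in + completion_tokens * price_out) / Decimal(1_000_000)
    return cost.quantize(Decimal("0.000001"))  # six decimals, like numeric(10,6)


# 1200 prompt tokens and 300 completion tokens on gpt-4o-mini:
# (1200 * 0.15 + 300 * 0.60) / 1_000_000 = 360 / 1_000_000
print(estimate_cost_usd("gpt-4o-mini", 1200, 300))  # 0.000360
```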

Next.js — Consuming the stream (TypeScript)

typescript

// components/chatbot/ChatPreview.tsx
export async function sendMessage(
  chatbotId: string,
  message: string,
  history: Message[],
  onChunk: (text: string) => void,
  onDone: (usage: Usage) => void
) {
  const res = await fetch(`${API_URL}/chat`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${await getAccessToken()}`,
    },
    body: JSON.stringify({ chatbot_id: chatbotId, session_id: getSessionId(), message, history }),
  });

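  if (!res.ok || !res.body) {
    throw new Error(`Chat request failed with status ${res.status}`);
  }

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // A network chunk can end mid-event, so buffer until the blank-line delimiter
    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";
    for (const event of events) {
      for (const line of event.split("\n")) {
        if (!line.startsWith("data: ")) continue;
        const payload = JSON.parse(line.slice(6));
        if (payload.text) onChunk(payload.text);
        if (payload.done) onDone({ tokens: payload.tokens, cost: payload.cost });
      }
    }
  }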
}
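
Any consumer of this stream has to cope with network chunks that do not line up with SSE event boundaries. The sketch below shows the buffering idea in Python, for example for an integration test against POST /chat. The SSEParser class is a hypothetical helper, and it assumes the `data: {json}` plus blank-line framing produced by the endpoint above.

```python
import json
from typing import Iterator


class SSEParser:
    """Incrementally parse 'data: {...}' events separated by blank lines."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> Iterator[dict]:
        self._buffer += chunk
        # Only complete events (terminated by a blank line) are parsed;
        # a trailing partial event stays in the buffer for the next chunk.
        while "\n\n" in self._buffer:
            event, self._buffer = self._buffer.split("\n\n", 1)
            for line in event.split("\n"):
                if line.startswith("data: "):
                    yield json.loads(line[len("data: "):])


parser = SSEParser()
events = []
# The same two events, deliberately split at awkward positions
for piece in ['data: {"te', 'xt": "Hi"}\n\ndata: {"done": tr', 'ue, "tokens": 3}\n\n']:
    events.extend(parser.feed(piece))
print(events)  # [{'text': 'Hi'}, {'done': True, 'tokens': 3}]
```

When reading raw bytes, an incremental decoder is also needed, since a multi-byte UTF-8 character can be split across chunks in the same way.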

RAG Pipeline

Document ingestion → chunking → embedding → Qdrant storage. Retrieval at chat time injects top-K relevant chunks into the system prompt.
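
To make the chunking step concrete, here is the window arithmetic for overlapping chunks, using the same defaults as the ingestion worker's chunk_text (256 words per chunk, 32 words of overlap). The helper name chunk_starts is hypothetical. Note that the unit is whitespace-separated words, which only approximates tokens.

```python
def chunk_starts(n_words: int, chunk_size: int = 256, overlap: int = 32) -> list:
    """Start offsets of each chunk; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # 224 words of new text per chunk by default
    return list(range(0, n_words, step))


# A 600-word document yields chunks covering words 0-255, 224-479 and 448-599
print(chunk_starts(600))  # [0, 224, 448]
```

The guard matters: with overlap equal to or larger than chunk_size the step would be zero or negative and a naive while-loop would never advance.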

Ingestion Worker — workers/tasks.py

python

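import os
from celery import Celery
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, Distance, VectorParams
from openai import AsyncOpenAI
import pdfplumber, re, uuid

# Connection settings come from the environment (see .env.example)
app = Celery("tasks", broker=os.environ.get("REDIS_URL", "redis://localhost:6379"))
qdrant = QdrantClient(url=os.environ["QDRANT_URL"], api_key=os.environ["QDRANT_API_KEY"])
openai = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment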

def chunk_text(text: str, chunk_size=256, overlap=32) -> list[str]:
    """Split text into overlapping token-approximate chunks."""
    words = text.split()
    chunks, i = [], 0
    while i < len(words):
        chunk = " ".join(words[i : i + chunk_size])
        chunks.append(chunk)
        i += chunk_size - overlap
    return [c for c in chunks if len(c.strip()) > 20]

async def embed_texts(texts: list[str]) -> list[list[float]]:
    res = await openai.embeddings.create(
        model="text-embedding-3-small", input=texts
    )
    return [e.embedding for e in res.data]

def ensure_collection(workspace_id: str):
    col = f"ws_{workspace_id.replace('-','_')}"
    if col not in [c.name for c in qdrant.get_collections().collections]:
        qdrant.create_collection(col, vectors_config=VectorParams(
            size=1536, distance=Distance.COSINE
        ))
    return col

@app.task(bind=True, max_retries=3)
def ingest_document(self, doc_id: str, workspace_id: str, chatbot_id: str,
                    file_path: str, source_type: str):
    import asyncio
    asyncio.run(_ingest(doc_id, workspace_id, chatbot_id, file_path, source_type))

async def _ingest(doc_id, workspace_id, chatbot_id, file_path, source_type):
    from ..db.client import db

    await db.execute(
        "UPDATE knowledge_documents SET status='processing' WHERE id=$1", doc_id
    )
    try:
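        # 1. Extract text
        if source_type == "pdf":
            with pdfplumber.open(file_path) as pdf:
                text = "\n".join(p.extract_text() or "" for p in pdf.pages)
        elif source_type == "docx":
            import docx  # python-docx; .docx is a zip archive, not plain text
            text = "\n".join(p.text for p in docx.Document(file_path).paragraphs)
        elif source_type == "url":
            import trafilatura
            downloaded = trafilatura.fetch_url(file_path)
            if downloaded is None:
                raise ValueError(f"Could not fetch {file_path}")
            text = trafilatura.extract(downloaded) or ""
        else:
            with open(file_path, encoding="utf-8") as f:
                text = f.read()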

        # 2. Chunk
        chunks = chunk_text(re.sub(r'\s+', ' ', text))

        # 3. Embed (batch of 100)
        all_embeddings = []
        for i in range(0, len(chunks), 100):
            batch = await embed_texts(chunks[i:i+100])
            all_embeddings.extend(batch)

        # 4. Upsert to Qdrant
        col = ensure_collection(workspace_id)
        points = [
            PointStruct(
                id=str(uuid.uuid4()),
                vector=emb,
                payload={
                    "text": chunk,
                    "doc_id": doc_id,
                    "chatbot_id": chatbot_id,
                    "workspace_id": workspace_id,
                    "chunk_index": i,
                }
            )
            for i, (chunk, emb) in enumerate(zip(chunks, all_embeddings))
        ]
        qdrant.upsert(collection_name=col, points=points)

        await db.execute(
            "UPDATE knowledge_documents SET status='indexed', chunk_count=$1 WHERE id=$2",
            len(chunks), doc_id
        )
    except Exception as e:
        await db.execute(
            "UPDATE knowledge_documents SET status='error', error_msg=$1 WHERE id=$2",
            str(e), doc_id
        )
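
The worker above assigns a random uuid4 to every point, so a retried or re-run task stores the same chunks twice. One possible fix, sketched here under the assumption that (doc_id, chunk_index) identifies a chunk, is to derive the point ID deterministically with uuid5. The namespace URL and the function name are hypothetical.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes
CHUNK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://modelpilot.ai/chunks")


def chunk_point_id(doc_id: str, chunk_index: int) -> str:
    """Same document + same chunk index always maps to the same point ID."""
    return str(uuid.uuid5(CHUNK_NAMESPACE, f"{doc_id}:{chunk_index}"))


first_run = chunk_point_id("doc-123", 0)
retry = chunk_point_id("doc-123", 0)
print(first_run == retry)                       # True: an upsert overwrites, not duplicates
print(first_run == chunk_point_id("doc-123", 1))  # False: other chunks get other IDs
```

Because an upsert with an existing ID replaces that point, re-ingestion becomes idempotent. If a re-ingested document produces fewer chunks than before, the stale tail still needs to be deleted, for example by filtering on doc_id first.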

Retrieval — services/rag.py

python

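import os
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue
from openai import AsyncOpenAI

# Connection settings come from the environment (see .env.example).
# Note: QdrantClient is synchronous, so search() below blocks the event loop
# for the duration of the request.
qdrant = QdrantClient(url=os.environ["QDRANT_URL"], api_key=os.environ["QDRANT_API_KEY"])
openai = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment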

async def retrieve_context(
    query: str,
    chatbot_id: str,
    workspace_id: str,
    top_k: int = 5,
    score_threshold: float = 0.72
) -> list[dict]:
    # Embed the query
    res = await openai.embeddings.create(
        model="text-embedding-3-small", input=query
    )
    q_vector = res.data[0].embedding

    col = f"ws_{workspace_id.replace('-','_')}"

    # Search with chatbot_id filter
    results = qdrant.search(
        collection_name=col,
        query_vector=q_vector,
        limit=top_k,
        score_threshold=score_threshold,
        query_filter=Filter(must=[
            FieldCondition(key="chatbot_id", match=MatchValue(value=chatbot_id))
        ])
    )

    return [
        {"text": r.payload["text"], "score": round(r.score, 3), "doc_id": r.payload["doc_id"]}
        for r in results
    ]
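
To show what top_k and score_threshold do in the search above, here is a small pure-Python model of cosine-similarity retrieval. It is only an illustration with toy 2-dimensional vectors: Qdrant performs this ranking server-side with an index, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_above(query, chunks: dict, top_k: int = 5, score_threshold: float = 0.72):
    """Keep chunks scoring at least score_threshold, best first, at most top_k."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in chunks.items()]
    kept = [(name, score) for name, score in scored if score >= score_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:top_k]


chunks = {
    "a": [1.0, 0.0],  # same direction as the query: score 1.0
    "b": [1.0, 1.0],  # 45 degrees off: about 0.707, below the 0.72 cutoff
    "c": [0.0, 1.0],  # orthogonal: score 0.0
    "d": [3.0, 1.0],  # about 0.949
}
print([name for name, _ in top_k_above([1.0, 0.0], chunks)])  # ['a', 'd']
```

A threshold means a query with no relevant documents returns an empty list, in which case the chat endpoint sends the system prompt without a knowledge-base section.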

Embed Widget

Vanilla JS, zero dependencies, ~6KB gzipped. Injected via a single <script> tag. Self-contained shadow DOM to prevent CSS leakage.

javascript

// widget/src/widget.ts — compiled to widget/dist/widget.js
(function() {
  const config = window.ModelPilotConfig || {};
  const BOT_ID = config.botId || document.currentScript.dataset.botId;
  const API  = "https://api.modelpilot.ai/v1";
  let sessionId = localStorage.getItem("mp_session");
  if (!sessionId) {
    sessionId = crypto.randomUUID();
    localStorage.setItem("mp_session", sessionId);
  }

  // Fetch widget config from API
  async function init() {
    const res = await fetch(`${API}/widget/${BOT_ID}/config`);
    const cfg = await res.json();
    render(cfg);
  }

  function render(cfg) {
    const host = document.createElement("div");
    const shadow = host.attachShadow({ mode: "closed" });
    document.body.appendChild(host);

    shadow.innerHTML = `
      <style>
        :host { all: initial; font-family: system-ui; }
        #launcher {
          position: fixed; ${cfg.position === "bottom-left" ? "left" : "right"}: 20px;
          bottom: 20px; width: 52px; height: 52px; border-radius: 50%;
          background: ${cfg.accentColor}; cursor: pointer; display: flex;
          align-items: center; justify-content: center; font-size: 24px;
          box-shadow: 0 4px 20px rgba(0,0,0,0.18); z-index: 999999;
          border: none; transition: transform .2s;
        }
        #launcher:hover { transform: scale(1.08); }
        #window {
          position: fixed; ${cfg.position === "bottom-left" ? "left" : "right"}: 20px;
          bottom: 82px; width: 360px; height: 560px; border-radius: 18px;
          background: #fff; box-shadow: 0 12px 48px rgba(0,0,0,0.18);
          display: none; flex-direction: column; overflow: hidden; z-index: 999998;
        }
        #window.open { display: flex; }
        #header { background: ${cfg.accentColor}; padding: 14px 16px;
          color: white; font-weight: 700; font-size: 14px;
          display: flex; align-items: center; gap: 10px; }
        #messages { flex: 1; overflow-y: auto; padding: 14px;
          display: flex; flex-direction: column; gap: 10px; background: #f7f8fc; }
        .msg { max-width: 82%; padding: 10px 14px; border-radius: 12px;
          font-size: 13.5px; line-height: 1.5; }
        .user { align-self: flex-end; background: ${cfg.accentColor}; color: white;
          border-radius: 12px 12px 2px 12px; }
        .bot { align-self: flex-start; background: white; color: #09090b;
          border-radius: 12px 12px 12px 2px; box-shadow: 0 1px 4px rgba(0,0,0,.08); }
        #input-row { display: flex; gap: 8px; padding: 10px; border-top: 1px solid #f0f0f0; }
        #input { flex: 1; border: 1px solid #e4e4e7; border-radius: 9px;
          padding: 8px 12px; font-size: 13px; outline: none; }
        #send { background: ${cfg.accentColor}; color: white; border: none;
          border-radius: 9px; padding: 8px 14px; cursor: pointer; font-weight: 700; }
      </style>
      <button id="launcher">💬</button>
      <div id="window">
        <div id="header"></div>
        <div id="messages">
          <div class="msg bot" id="greeting"></div>
        </div>
        <div id="input-row">
          <input id="input" placeholder="Type a message…" />
          <button id="send">↑</button>
        </div>
      </div>`;

    // Set API-supplied strings via textContent so they are never parsed as HTML
    shadow.getElementById("header").textContent = "🤖 " + (cfg.botName || "Assistant");
    shadow.getElementById("greeting").textContent = cfg.greeting || "Hi! How can I help?";

    const launcher = shadow.getElementById("launcher");
    const win      = shadow.getElementById("window");
    const msgs     = shadow.getElementById("messages");
    const input    = shadow.getElementById("input");
    const send     = shadow.getElementById("send");
    let history = [];

    launcher.onclick = () => win.classList.toggle("open");

    async function sendMessage() {
      const text = input.value.trim();
      if (!text) return;
      input.value = "";

      addMsg("user", text);
      history.push({ role: "user", content: text });

      const botEl = addMsg("bot", "▌");  // streaming cursor
      let full = "";

      const res = await fetch(`${API}/widget/${BOT_ID}/chat`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ session_id: sessionId, message: text, history })
      });

      const reader = res.body.getReader();
      const dec = new TextDecoder();
      let buffer = "";

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // Chunks can split an SSE event, so only parse complete events
        buffer += dec.decode(value, { stream: true });
        const events = buffer.split("\n\n");
        buffer = events.pop();
        events.forEach(evt => evt.split("\n")
          .filter(l => l.startsWith("data: "))
          .forEach(l => {
            const p = JSON.parse(l.slice(6));
            if (p.text) { full += p.text; botEl.textContent = full + "▌"; }
            if (p.done) { botEl.textContent = full; }
          }));
        msgs.scrollTop = msgs.scrollHeight;
      }
      history.push({ role: "assistant", content: full });
    }

    function addMsg(role, text) {
      const el = document.createElement("div");
      el.className = "msg " + role;
      el.textContent = text;
      msgs.appendChild(el);
      msgs.scrollTop = msgs.scrollHeight;
      return el;
    }

    send.onclick = sendMessage;
    input.onkeydown = e => e.key === "Enter" && sendMessage();
  }

  init();
})();
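
The widget interpolates cfg.accentColor and cfg.position straight into a style block, so the public config endpoint should only ever return values that are safe to embed. A possible server-side check is sketched below. The function name, the default colour and the 200-character limit are assumptions, and the keys follow what the widget reads (accentColor, position, greeting, botName).

```python
import re

HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})")
ALLOWED_POSITIONS = {"bottom-right", "bottom-left"}
MAX_TEXT = 200  # hypothetical length limit for greeting and bot name
DEFAULTS = {
    "accentColor": "#4f46e5",  # hypothetical default colour
    "position": "bottom-right",
    "greeting": "Hi! How can I help?",
    "botName": "Assistant",
}


def sanitize_widget_config(raw: dict) -> dict:
    """Return a config where every value is either validated or a default."""
    cfg = dict(DEFAULTS)
    color = raw.get("accentColor")
    # fullmatch rejects anything that is not exactly a hex colour,
    # which blocks CSS injection through the colour field
    if isinstance(color, str) and HEX_COLOR.fullmatch(color):
        cfg["accentColor"] = color
    if raw.get("position") in ALLOWED_POSITIONS:
        cfg["position"] = raw["position"]
    for field in ("greeting", "botName"):
        value = raw.get(field)
        if isinstance(value, str) and value.strip():
            cfg[field] = value.strip()[:MAX_TEXT]
    return cfg


print(sanitize_widget_config({"accentColor": "red; } body { display: none", "position": "top"}))
```

Text fields are only trimmed and length-limited here, because the widget is expected to insert them with textContent, not as HTML.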

Auth & Multi-tenancy

Supabase handles JWT issuance and OAuth. FastAPI verifies JWTs and injects workspace context. Row Level Security on every table enforces isolation at the DB layer.

FastAPI JWT middleware — middleware/auth.py

python

from dataclasses import dataclass
from typing import Optional
import os

from fastapi import Header, HTTPException
from jose import jwt, JWTError

from ..db.client import db

SUPABASE_JWT_SECRET = os.environ["SUPABASE_JWT_SECRET"]

@dataclass
class WorkspaceCtx:
    user_id: str
    workspace_id: str
    role: str
    plan: str
    message_quota: Optional[int]

async def get_current_workspace(
    authorization: str = Header(...)
) -> WorkspaceCtx:
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise HTTPException(401, "Missing bearer token")
    try:
        # Supabase user tokens carry aud="authenticated"; python-jose rejects a
        # token that has an aud claim unless the expected audience is passed.
        payload = jwt.decode(
            token, SUPABASE_JWT_SECRET,
            algorithms=["HS256"], audience="authenticated"
        )
        user_id = payload["sub"]
    except (JWTError, KeyError):
        raise HTTPException(401, "Invalid token")

    # Load the user's oldest accepted membership.
    # (The 60s Redis cache for this lookup is not shown in this sketch.)
    row = await db.fetchrow("""
        SELECT wm.workspace_id, wm.role, w.plan, w.message_quota
        FROM workspace_members wm
        JOIN workspaces w ON w.id = wm.workspace_id
        WHERE wm.user_id = $1 AND wm.joined_at IS NOT NULL
        ORDER BY wm.joined_at LIMIT 1
    """, user_id)

    if not row:
        raise HTTPException(403, "No workspace found")

    return WorkspaceCtx(
        user_id=user_id,
        workspace_id=str(row["workspace_id"]),
        role=row["role"],
        plan=row["plan"],
        message_quota=row["message_quota"],
    )
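
To make explicit what the JWT check involves, the sketch below verifies an HS256 token with only the standard library: signature, algorithm, expiry and audience. It is illustrative and not a replacement for a maintained JWT library; the function names are hypothetical and sign_hs256 exists only so the example can produce a token to verify.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: str) -> str:
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: str, audience: str, now=None) -> dict:
    try:
        header_b64, body_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(body_b64))
        signature = _b64url_decode(sig_b64)
    except (ValueError, TypeError) as exc:
        raise ValueError("malformed token") from exc
    # Pin the algorithm instead of trusting whatever the header claims
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{body_b64}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time compare
        raise ValueError("bad signature")
    now = time.time() if now is None else now
    if not isinstance(claims, dict) or claims.get("exp", 0) <= now:
        raise ValueError("token expired")
    if claims.get("aud") != audience:
        raise ValueError("wrong audience")
    return claims


token = sign_hs256({"sub": "user-1", "aud": "authenticated", "exp": 2000}, "s3cret")
print(verify_hs256(token, "s3cret", "authenticated", now=1000)["sub"])  # user-1
```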

Supabase RLS Policies — SQL

sql

-- Enable RLS on tenant-scoped tables (a policy has no effect until RLS is enabled)
ALTER TABLE chatbots            ENABLE ROW LEVEL SECURITY;
ALTER TABLE ai_providers        ENABLE ROW LEVEL SECURITY;
ALTER TABLE knowledge_documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE conversations       ENABLE ROW LEVEL SECURITY;
ALTER TABLE messages            ENABLE ROW LEVEL SECURITY;
ALTER TABLE usage_events        ENABLE ROW LEVEL SECURITY;

-- Helper function: get user's workspace IDs
CREATE OR REPLACE FUNCTION auth.workspace_ids()
RETURNS uuid[] LANGUAGE sql STABLE AS $$
  SELECT array_agg(workspace_id)
  FROM workspace_members
  WHERE user_id = auth.uid() AND joined_at IS NOT NULL;
$$;

-- Chatbots: members can read, editors/admins can write
CREATE POLICY chatbots_select ON chatbots
  FOR SELECT USING (workspace_id = ANY(auth.workspace_ids()));

-- joined_at IS NOT NULL excludes pending invites from write access
CREATE POLICY chatbots_insert ON chatbots
  FOR INSERT WITH CHECK (
    workspace_id IN (
      SELECT workspace_id FROM workspace_members
      WHERE user_id = auth.uid() AND joined_at IS NOT NULL
        AND role IN ('editor','admin')
    )
  );

CREATE POLICY chatbots_update ON chatbots
  FOR UPDATE USING (
    workspace_id IN (
      SELECT workspace_id FROM workspace_members
      WHERE user_id = auth.uid() AND joined_at IS NOT NULL
        AND role IN ('editor','admin')
    )
  );

-- Conversations: members can view only their workspace
CREATE POLICY conversations_select ON conversations
  FOR SELECT USING (workspace_id = ANY(auth.workspace_ids()));

-- Admins only: billing + provider keys (pending invites excluded)
CREATE POLICY providers_admin ON ai_providers
  FOR ALL USING (
    workspace_id IN (
      SELECT workspace_id FROM workspace_members
      WHERE user_id = auth.uid() AND joined_at IS NOT NULL
        AND role = 'admin'
    )
  );

Next.js Auth middleware — middleware.ts

typescript

import { createMiddlewareClient } from "@supabase/auth-helpers-nextjs";
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

export async function middleware(req: NextRequest) {
  const res  = NextResponse.next();
  const supabase = createMiddlewareClient({ req, res });
  const { data: { session } } = await supabase.auth.getSession();

  // Both auth routes must stay reachable without a session
  const isAuthPage = ["/login", "/signup"].some(p => req.nextUrl.pathname.startsWith(p));

  if (!session && !isAuthPage) {
    return NextResponse.redirect(new URL("/login", req.url));
  }
  if (session && isAuthPage) {
    return NextResponse.redirect(new URL("/", req.url));
  }
  return res;
}

export const config = {
  matcher: ["/((?!_next/static|_next/image|favicon|widget.js).*)"]
};

Deployment

Frontend on Vercel, backend on Railway. The widget JS is served from Cloudflare R2 (or Supabase Storage) behind a CDN for low-latency global delivery. All services deploy on git push.

Vercel

Next.js frontend · Edge network · Auto-SSL

Railway

FastAPI · Celery · Redis · Auto-deploy

Supabase

Postgres · Auth · Storage · Realtime

Qdrant Cloud

Managed vector DB · 1GB free tier

docker-compose.yml (Local Dev)

yaml

version: "3.9"
services:
  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]

  qdrant:
    image: qdrant/qdrant:latest
    ports: ["6333:6333"]
    volumes: ["./qdrant_storage:/qdrant/storage"]

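  api:
    build: ./apps/api
    ports: ["8000:8000"]
    env_file: .env
    environment:
      REDIS_URL: redis://redis:6379   # inside Compose, Redis is reached by service name
    depends_on: [redis, qdrant]
    command: uvicorn main:app --host 0.0.0.0 --port 8000 --reload

  worker:
    build: ./apps/api
    env_file: .env
    environment:
      REDIS_URL: redis://redis:6379   # localhost would point at the worker container itself
    depends_on: [redis]
    command: celery -A workers.tasks worker --loglevel=info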

Security Checklist

✓ Implement

AES-256 encrypt all API keys at rest

Row Level Security on all Supabase tables

Rate limiting per workspace (Redis)

Widget CORS: allow any origin (public)

API CORS: restrict to your domains only

Stripe webhook signature verification

Sentry for error monitoring (free tier)

HTTPS everywhere (Vercel + Railway enforce)
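
As an illustration of the Stripe webhook signature check listed above, the sketch below reproduces the scheme with the standard library: the Stripe-Signature header carries a timestamp `t` and one or more `v1` signatures, each an HMAC-SHA256 of `"{t}.{payload}"` keyed with the webhook secret. The function name and the 300-second tolerance parameter are assumptions for this sketch. In production the official stripe library's Webhook.construct_event does this check for you.

```python
import hashlib
import hmac
import time


def verify_stripe_signature(payload: bytes, sig_header: str, secret: str,
                            tolerance_seconds: int = 300, now=None) -> bool:
    timestamp, candidates = None, []
    for part in sig_header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False
    # Reject old events so a captured request cannot be replayed later
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance_seconds:
        return False
    signed = timestamp.encode() + b"." + payload  # must be the raw request body
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)


body = b'{"id": "evt_test"}'
sig = hmac.new(b"whsec_test", b"1700000000." + body, hashlib.sha256).hexdigest()
header = f"t=1700000000,v1={sig}"
print(verify_stripe_signature(body, header, "whsec_test", now=1700000100))  # True
```

The check has to run on the raw bytes of the request, before any JSON parsing, because re-serialising the body changes the bytes that were signed.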

⚠ Don't Forget

Never log API keys or tokens to console

Monthly budget cap to prevent runaway costs

Input sanitization before sending to LLM

Rotate SUPABASE_JWT_SECRET quarterly

Soft-delete chatbots (don't destroy data)

Celery task idempotency (avoid duplicate chunks)

Environment Variables

bash

# .env.example

# Supabase
SUPABASE_URL=https://xxxx.supabase.co
SUPABASE_ANON_KEY=eyJ...
SUPABASE_SERVICE_KEY=eyJ...          # Never expose to client
SUPABASE_JWT_SECRET=your-jwt-secret

# Qdrant
QDRANT_URL=https://xxxx.qdrant.io
QDRANT_API_KEY=...

# OpenAI (for embeddings only)
OPENAI_API_KEY=sk-...

# Encryption key for stored provider keys (32 bytes)
ENCRYPTION_KEY=base64-encoded-32-byte-key

# Redis
REDIS_URL=redis://localhost:6379

# Stripe
STRIPE_SECRET_KEY=sk_test_...         # use the sk_live_ key only in production
STRIPE_WEBHOOK_SECRET=whsec_...
STRIPE_PRICE_ID_STARTER=price_...
STRIPE_PRICE_ID_PRO=price_...

# App
NEXT_PUBLIC_API_URL=https://api.modelpilot.ai/v1
NEXT_PUBLIC_SUPABASE_URL=https://xxxx.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJ...
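
A small startup check can catch a missing or malformed variable before the first request does. The sketch below is one way to do it for the backend variables listed above. The Settings class, the load_settings name and the choice of which variables count as required are assumptions, and only a few fields are kept to stay short.

```python
import base64
import binascii
from dataclasses import dataclass
from typing import Mapping

REQUIRED = (
    "SUPABASE_URL", "SUPABASE_SERVICE_KEY", "SUPABASE_JWT_SECRET",
    "QDRANT_URL", "QDRANT_API_KEY", "OPENAI_API_KEY",
    "ENCRYPTION_KEY", "REDIS_URL",
    "STRIPE_SECRET_KEY", "STRIPE_WEBHOOK_SECRET",
)


@dataclass(frozen=True)
class Settings:
    redis_url: str
    qdrant_url: str
    encryption_key: bytes  # decoded 32-byte AES-256 key


def load_settings(env: Mapping[str, str]) -> Settings:
    """Validate the environment once at startup and fail with a clear message."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    try:
        key = base64.b64decode(env["ENCRYPTION_KEY"], validate=True)
    except binascii.Error as exc:
        raise RuntimeError("ENCRYPTION_KEY is not valid base64") from exc
    if len(key) != 32:
        raise RuntimeError(f"ENCRYPTION_KEY must decode to 32 bytes, got {len(key)}")
    return Settings(
        redis_url=env["REDIS_URL"],
        qdrant_url=env["QDRANT_URL"],
        encryption_key=key,
    )


example_env = {name: "placeholder" for name in REQUIRED}
example_env["ENCRYPTION_KEY"] = base64.b64encode(b"k" * 32).decode()
print(load_settings(example_env).redis_url)
```

In the application this would be called once with os.environ, for example at import time of main.py.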